import SetupEnv from './common/setup-env.mdx';
import { PackageManagerTabs } from '@theme';

# Integrate with Playwright

[Playwright.js](https://playwright.com/) is an open-source automation library developed by Microsoft, mainly used for end-to-end testing and web scraping of web applications.

There are two ways to integrate with Playwright:

- Directly integrate and call the Midscene Agent via script, suitable for quick prototyping, data scraping, and automation scripts.
- Integrate Midscene into Playwright test cases, suitable for UI testing scenarios.

<SetupEnv />

## Direct integration with Midscene agent

:::info Example Project
You can find an example project of direct Playwright integration here: [https://github.com/web-infra-dev/midscene-example/blob/main/playwright-demo](https://github.com/web-infra-dev/midscene-example/blob/main/playwright-demo)
:::

### Step 1: Install dependencies

<PackageManagerTabs command="install @midscene/web playwright @playwright/test tsx --save-dev" />

### Step 2: Write the script

Save the following code as `./demo.ts`:

```typescript
import { chromium } from 'playwright';
import { PlaywrightAgent } from '@midscene/web/playwright';
import 'dotenv/config'; // read environment variables from .env file

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

Promise.resolve(
  (async () => {
    const browser = await chromium.launch({
      headless: true, // 'true' means we can't see the browser window
      args: ['--no-sandbox', '--disable-setuid-sandbox'],
    });

    const page = await browser.newPage();
    await page.setViewportSize({
      width: 1280,
      height: 768,
    });
    await page.goto('https://www.ebay.com');
    await sleep(5000); // 👀 init Midscene agent
    const agent = new PlaywrightAgent(page);

    // 👀 type keywords, perform a search
    await agent.aiAction('type "Headphones" in search box, hit Enter');

    // 👀 wait for the loading
    await agent.aiWaitFor('there is at least one headphone item on page');
    // or you may use a plain sleep:
    // await sleep(5000);

    // 👀 understand the page content, find the items
    const items = await agent.aiQuery(
      '{itemTitle: string, price: Number}[], find item in list and corresponding price',
    );
    console.log('headphones in stock', items);

    const isMoreThan1000 = await agent.aiBoolean(
      'Is the price of the headphones more than 1000?',
    );
    console.log('isMoreThan1000', isMoreThan1000);

    const price = await agent.aiNumber(
      'What is the price of the first headphone?',
    );
    console.log('price', price);

    const name = await agent.aiString(
      'What is the name of the first headphone?',
    );
    console.log('name', name);

    const location = await agent.aiLocate(
      'What is the location of the first headphone?',
    );
    console.log('location', location);

    // 👀 assert by AI
    await agent.aiAssert('There is a category filter on the left');

    // 👀 click on the first item
    await agent.aiTap('the first item in the list');

    await browser.close();
  })(),
);
```

For more Agent API details, please refer to [API Reference](./api.mdx).

### Step 3: Run the script

Use `tsx` to run, and you will see the product information printed in the terminal:

```bash
# run
npx tsx demo.ts

# The terminal should output something like:
#  [
#   {
#     itemTitle: 'JBL Tour Pro 2 - True wireless Noise Cancelling earbuds with Smart Charging Case',
#     price: 551.21
#   },
#   {
#     itemTitle: 'Soundcore Space One Wireless Headphones 40H ANC Playtime 2XStronger Voice',
#     price: 543.94
#   }
# ]
```

### Step 4: View the run report

After the above command executes successfully, it will output: `Midscene - report file updated: /path/to/report/some_id.html`. Open this file in your browser to view the report.

### How to force navigation in the current tab

If you want to force the page to open in the current tab (for example, clicking a link with `target="_blank"`), you can set the `forceSameTabNavigation` option to `true`:

```typescript
const mid = new PlaywrightAgent(page, {
  forceSameTabNavigation: true,
});
```

## Integration in Playwright test cases

Here we assume you already have a repository with Playwright integration.

:::info Example Project
You can find an example project of Playwright test integration here: [https://github.com/web-infra-dev/midscene-example/blob/main/playwright-testing-demo](https://github.com/web-infra-dev/midscene-example/blob/main/playwright-testing-demo)
:::

### Step 1: Add dependencies and update configuration

Add dependencies

<PackageManagerTabs command="install @midscene/web --save-dev" />

Update playwright.config.ts

```diff
export default defineConfig({
  testDir: './e2e',
+ timeout: 90 * 1000,
+ reporter: [["list"], ["@midscene/web/playwright-reporter", { type: "merged" }]], // type optional, default is "merged", means multiple test cases generate one report, optional value is "separate", means one report for each test case
});
```

The `type` option of the `reporter` configuration can be `merged` or `separate`. The default value is `merged`, which indicates that one merged report for all test cases; the optional value is `separate`, indicating that the report is separated for each test case.

### Step 2: Extend the `test` instance

Save the following code as `./e2e/fixture.ts`:

```typescript
import { test as base } from '@playwright/test';
import type { PlayWrightAiFixtureType } from '@midscene/web/playwright';
import { PlaywrightAiFixture } from '@midscene/web/playwright';

export const test = base.extend<PlayWrightAiFixtureType>(
  PlaywrightAiFixture({
    waitForNetworkIdleTimeout: 2000, // optional, the timeout for waiting for network idle between each action, default is 2000ms
  }),
);
```

### Step 3: Write test cases

#### Basic AI operation APIs

- `ai` or `aiAction` - General AI interaction
- `aiTap` - Click operation
- `aiHover` - Hover operation
- `aiInput` - Input operation
- `aiKeyboardPress` - Keyboard operation
- `aiScroll` - Scroll operation
- `aiRightClick` - Right-click operation

#### Query

- `aiAsk` - Ask AI Model anything about the current page
- `aiQuery` - Extract structured data from current page
- `aiNumber` - Extract number from current page
- `aiString` - Extract string from current page
- `aiBoolean` - Extract boolean from current page

#### More APIs

- `aiAssert` - AI Assertion
- `aiWaitFor` - AI Wait
- `aiLocate` - Locate Element
- `runYaml` - Execute YAML automation script
- `setAIActionContext` - Set AI action context, send background knowledge to AI model when calling `agent.aiAction()`
- `evaluateJavaScript` - Execute JavaScript in page context
- `logScreenshot` - Log screenshot with description in report file
- `freezePageContext` - Freeze page context
- `unfreezePageContext` - Unfreeze page context

Besides the exposed shortcut methods, if you need to call other [API](./api.mdx) provided by the agent, you can use `agentForPage` to get the `PageAgent` instance, and use `PageAgent` to call the API for interaction:

```typescript
test('case demo', async ({ agentForPage, page }) => {
  const agent = await agentForPage(page);

  await agent.logScreenshot();
  const logContent = agent._unstableLogContent();
  console.log(logContent);
});
```

#### Example code

```typescript title="./e2e/ebay-search.spec.ts"
import { expect } from '@playwright/test';
import { test } from './fixture';

test.beforeEach(async ({ page }) => {
  page.setViewportSize({ width: 400, height: 905 });
  await page.goto('https://www.ebay.com');
  await page.waitForLoadState('networkidle');
});

test('search headphone on ebay', async ({
  ai,
  aiQuery,
  aiAssert,
  aiInput,
  aiTap,
  aiScroll,
  aiWaitFor,
  aiRightClick,
  logScreenshot,
}) => {
  // Use aiInput to enter search keyword
  await aiInput('Headphones', 'search box');

  // Use aiTap to click search button
  await aiTap('search button');

  // Wait for search results to load
  await aiWaitFor('search results list loaded', { timeoutMs: 5000 });

  // Use aiScroll to scroll to bottom
  await aiScroll(
    {
      direction: 'down',
      scrollType: 'untilBottom',
    },
    'search results list',
  );

  // Use aiQuery to get product information
  const items = await aiQuery<Array<{ title: string; price: number }>>(
    'get product titles and prices from search results',
  );

  console.log('headphones in stock', items);
  expect(items?.length).toBeGreaterThan(0);

  // Use aiAssert to verify filter functionality
  await aiAssert('category filter exists on the left side');

  // Use logScreenshot to capture the current state
  await logScreenshot('Search Results', {
    content: 'Final search results for headphones',
  });
});
```

For more Agent API details, please refer to [API Reference](./api.mdx).

### Step 4. Run test cases

```bash
npx playwright test ./e2e/ebay-search.spec.ts
```

### Step 5. View test report

After the command executes successfully, it will output: `Midscene - report file updated: ./current_cwd/midscene_run/report/some_id.html`. Open this file in your browser to view the report.


## Provide custom actions

Use the `customActions` option to extend the agent's action space with your own actions defined via `defineAction`. When provided, these actions will be appended to the built-in ones so the agent can call them during planning.

```typescript
import { getMidsceneLocationSchema, z } from '@midscene/core';
import { defineAction } from '@midscene/core/device';

const ContinuousClick = defineAction({
  name: 'continuousClick',
  description: 'Click the same target repeatedly',
  paramSchema: z.object({
    locate: getMidsceneLocationSchema(),
    count: z
      .number()
      .int()
      .positive()
      .describe('How many times to click'),
  }),
  async call(param) {
    const { locate, count } = param;
    console.log('click target center', locate.center);
    console.log('click count', count);
    // carry out your clicking logic using locate + count
  },
});

const agent = new PlaywrightAgent(page, {
  customActions: [ContinuousClick],
});

await agent.aiAction('click the red button five times');
```

Check [Integrate with any interface](./integrate-with-any-interface#define-a-custom-action) for more details about defining custom actions.

## More

- For all the methods on the Agent, please refer to [API Reference](./api.mdx).
- For more details about prompting, please refer to [Prompting Tips](./prompting-tips)
